Skip to content

improve Simpleupdate performances#361

Draft
ogauthe wants to merge 31 commits intoQuantumKitHub:masterfrom
ogauthe:simpleupdate
Draft

improve Simpleupdate performances#361
ogauthe wants to merge 31 commits intoQuantumKitHub:masterfrom
ogauthe:simpleupdate

Conversation

@ogauthe
Copy link
Copy Markdown
Contributor

@ogauthe ogauthe commented Apr 16, 2026

This is a work in progress to improve SimpleUpdate performances.

Context: SimpleUpdate is quite slow, especially for 2nd neighbor interaction. I had cases where it became the bottleneck, above CTMRG. I started working on improving the performances last week. I realized many parts could be ported upstream.

I followed 4 different strategies:

  • improving type stability
  • removing copies where I think they could be avoided
  • removing intermediate normalization
  • removing permutations by simplifying perm1∘ perm2

This is a work in progress. It already shows significant speed-up for e.g. finite temperature J1-J2 with SU(2) symmetry. I know it is possible to do much better. I just saw #360 which has overlap for _get_cluster_trunc and I realized it may be relevant to mention my work (objective is not to block any part of #360)

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 53.39806% with 96 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/algorithms/time_evolution/simpleupdate3site.jl 0.00% 60 Missing ⚠️
src/algorithms/time_evolution/apply_mpo.jl 0.00% 23 Missing ⚠️
src/algorithms/time_evolution/simpleupdate.jl 86.48% 10 Missing ⚠️
src/algorithms/contractions/absorb_weight.jl 95.12% 2 Missing ⚠️
src/algorithms/time_evolution/apply_gate.jl 87.50% 1 Missing ⚠️
Files with missing lines Coverage Δ
src/PEPSKit.jl 100.00% <ø> (ø)
src/algorithms/time_evolution/time_evolve.jl 70.00% <ø> (-22.50%) ⬇️
src/environments/suweight.jl 48.43% <ø> (-42.54%) ⬇️
src/algorithms/time_evolution/apply_gate.jl 74.46% <87.50%> (-18.87%) ⬇️
src/algorithms/contractions/absorb_weight.jl 95.12% <95.12%> (ø)
src/algorithms/time_evolution/simpleupdate.jl 79.85% <86.48%> (-15.27%) ⬇️
src/algorithms/time_evolution/apply_mpo.jl 0.00% <0.00%> (-100.00%) ⬇️
src/algorithms/time_evolution/simpleupdate3site.jl 21.31% <0.00%> (-70.69%) ⬇️

... and 12 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@Yue-Zhengyuan Yue-Zhengyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for improving SU! Left some initial comments.

Comment thread src/algorithms/time_evolution/simpleupdate.jl
Comment thread src/algorithms/time_evolution/simpleupdate.jl Outdated
(d, r, c), = _nn_bondrev(sites..., (Nr, Nc))
alg.bipartite && r > 1 && continue
ϵ′ = _su_iter!(state2, gate, env2, sites, alg)
ϵ′ = _su_iter_gate!(state2, gate, env2, sites[1], sites[2], alg)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I wanted _su_iter! to be able to handle both AbstractTensorMap{E, S, 2, 2} and length-2 MPO gates, though this is mainly for test purposes to ensure that MPO gates are applied correctly even for fermions. Is it possible to recover this behavior without loss of performance?

Copy link
Copy Markdown
Contributor Author

@ogauthe ogauthe Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I now see the point of using dispatch. There is a traedoff here between code readibility/efficiency (gate is much faster) and this test feature. Is this something that is currently tested?

[copy from above] In su_iter, we need an if statement to distinguish nearest neighbor from long distance at run time. Therefore my idea was that dispatch does not bring much here, while using different function names makes it explicit and easier to find which function/method is called.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that is currently tested?

It is tested with Hubbard model in test/timeevol/cluster_projectors.jl.

gate is much faster

The MPO way can still be optimized. When applying the gate MPO on the first and the last cluster sites, we can also use the trick of reduced bond tensors, which has not been done yet. Even for longer-range (e.g. NNN) bonds, this can still be useful. Separating out two unitary tensors at the two ends of an OBC-MPO will not affect finding the Vidal gauge.

We can try this out in a follow-up PR if you prefer not to do too much here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I will revert to dispatch. Keeping this conversation, closing the other relative to this one.

Comment thread src/algorithms/time_evolution/simpleupdate3site.jl
@Yue-Zhengyuan
Copy link
Copy Markdown
Member

Yue-Zhengyuan commented Apr 19, 2026

There is another recent change that further slowed down SU for 2nd neighbor interaction. After introducing LocalCircuit in #347 (and its predecessor TrotterGates in #339), each term in the Hamiltonian is individually exponentiated to a gate in the LocalCircuit. Nearest neighbor and next-nearest neighbor terms are no longer grouped together when exponentiating, resulting in an increase in the number of Trotter gates to be applied.

Comment thread src/algorithms/time_evolution/simpleupdate.jl Outdated
Comment thread src/algorithms/time_evolution/simpleupdate3site.jl Outdated
Comment thread src/environments/suweight.jl
Comment thread test/timeevol/j1j2_finiteT.jl Outdated
Copy link
Copy Markdown
Member

@lkdvos lkdvos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments, will try and actually look into this in detail next week :)

Comment thread src/algorithms/time_evolution/simpleupdate3site.jl Outdated
```
"""
function _qr_bond(A::PT, B::PT; gate_ax::Int = 1, kwargs...) where {PT <: Union{PEPSTensor, PEPOTensor}}
function _qr_bond(A::PT, B::PT; gate_ax::Integer = 1, kwargs...) where {PT <: Union{PEPSTensor, PEPOTensor}}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these functions, both for type stability and human readability, I would really prefer to just use dispatch and write out the two cases manually. _get_biperms is an indirection that doesn't hurt because the compiler decides to inline this, but depending on compiler-internals like this tends to not be great, and honestly I think this function would overall be better suited to simply be two functions with using dispatch and hardcoding the permutations (which could then use @tensor calls for the permutations, which is just more readable in general.

Copy link
Copy Markdown
Member

@Yue-Zhengyuan Yue-Zhengyuan Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See latest changes in #144 where I replaced _qr_bond with multiple bond_tensor_xxx functions (link).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ogauthe You can have a look at related parts of #144 and port them here if you wish.

Comment thread src/algorithms/time_evolution/apply_mpo.jl Outdated
Comment thread src/algorithms/time_evolution/apply_mpo.jl Outdated
Comment thread src/algorithms/contractions/absorb_weight.jl
(d, r, c), = _nn_bondrev(sites..., (Nr, Nc))
alg.bipartite && r > 1 && continue
ϵ′ = _su_iter!(state2, gate, env2, sites, alg)
ϵ′ = _su_iter_gate!(state2, gate, env2, sites[1], sites[2], alg)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that is currently tested?

It is tested with Hubbard model in test/timeevol/cluster_projectors.jl.

gate is much faster

The MPO way can still be optimized. When applying the gate MPO on the first and the last cluster sites, we can also use the trick of reduced bond tensors, which has not been done yet. Even for longer-range (e.g. NNN) bonds, this can still be useful. Separating out two unitary tensors at the two ends of an OBC-MPO will not affect finding the Vidal gauge.

We can try this out in a follow-up PR if you prefer not to do too much here.

```
"""
function _qr_bond(A::PT, B::PT; gate_ax::Int = 1, kwargs...) where {PT <: Union{PEPSTensor, PEPOTensor}}
function _qr_bond(A::PT, B::PT; gate_ax::Integer = 1, kwargs...) where {PT <: Union{PEPSTensor, PEPOTensor}}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ogauthe You can have a look at related parts of #144 and port them here if you wish.

@Yue-Zhengyuan
Copy link
Copy Markdown
Member

@ogauthe Do you mind if I temporarily take over to merge upstream changes and fix tests?

@ogauthe
Copy link
Copy Markdown
Contributor Author

ogauthe commented Apr 30, 2026

@ogauthe Do you mind if I temporarily take over to merge upstream changes and fix tests?

I just merged master, go ahead if there are other changes you plan for.

@Yue-Zhengyuan Yue-Zhengyuan self-assigned this Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants